/* Create a clustering variable: each county's largest 3-digit NAICS industry by year-2000 employment */

local Data "Cluster3digit"

noi disp "Building county clusters by largest 3-digit NAICS industry..."
noi disp "Verifying that `Data'.dta does not already exist..."
* This returns an error if the file does not exist.
capture confirm file `Data'.dta

* If the file is not found (return code 601), the commands below are run.
if _rc == 601 {
	noi disp "Data not found.  Building data..."
	noi disp "Building `Data'.dta..."


	insheet using "$SourceData\3_digit_q2.csv", clear

	* Exclude finance and insurance (NAICS 52x) and public administration (NAICS 92x);
	* the latter because there is no data for large vs. small.
	drop if floor(industry/10)==52 | floor(industry/10)==92

	* Verify that there are no duplicates
	isid geography industry year 

	* Keep only year-2000 observations.
	keep if year==2000
	drop if missing(emp)
	
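	* Rank industries within county by employment. Track ranking gives every
	* tied top industry rank 1; under the default (average) ranking a three-way
	* tie would assign rank 2 to all three and drop the county entirely.
	bysort geography: egen rank = rank(-emp), track
	keep if rank==1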

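	* Break ties at random. Sort on a unique key first so the random draws
	* are assigned to the same rows on every run.
	sort geography industry
	set seed 98034
	generate u1 = runiform()
	bysort geography: egen rank2 = rank(u1)
	keep if rank2==1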
	
	* Assign a cluster ID to each county based on its largest 3-digit NAICS industry
	
	egen cluster3 = group(industry)
	keep geography cluster3
	
	sort geography 
	
	save "$LocalData\\`Data'.dta", replace
	save "$LocalData\Archive\\`Data'`CurrentDate'.dta", replace

}
* This runs if the file was found (any return code other than 601).
else noi disp "Data already exists."
noi etime
noi disp " "
